DOCUMENT RESUME 


ED 421 550
TM 028 876

AUTHOR: Glas, Cees A. W.
TITLE: Modification Indices for the 2 PL and the Nominal Response Model. Research Report 98-04.
INSTITUTION: Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology.
PUB DATE: 1998-00-00
NOTE: 42p.
AVAILABLE FROM: Faculty of Educational Science and Technology, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands.
PUB TYPE: Reports - Evaluative (142)
EDRS PRICE: MF01/PC02 Plus Postage.
DESCRIPTORS: Foreign Countries; *Goodness of Fit; Item Response Theory; Maximum Likelihood Statistics; *Test Items
IDENTIFIERS: *Nominal Response Model; Rasch Model; Two Parameter Model


ABSTRACT 


In this paper it is shown that various violations of the two
parameter logistic (2 PL) model can be evaluated using the Lagrange multiplier
test (J. Aitchison and S. Silvey, 1958) or the equivalent efficient score
test. The tests focus on violation of local stochastic independence and
insufficient capture of the form of the item characteristic curves.
Primarily, the tests are item-oriented diagnostic tools, but taken together,
they also serve the purpose of evaluation of global model fit. A useful
feature of Lagrange multiplier statistics is that they are evaluated using
maximum likelihood estimates of the null model only; that is, the parameters
of alternative models need not be estimated. As numerical examples, an
application on real data and some power studies are presented. (Contains 1
figure, 9 tables, and 33 references.) (Author/SLD)


* Reproductions supplied by EDRS are the best that can be made * 

* from the original document. * 



Modification Indices for the 2PL 
and the Nominal Response Model 


Research Report 98-04


Cees A.W. Glas 




Faculty of Educational Science and Technology
Department of Educational Measurement and Data Analysis
University of Twente




Abstract 


In this paper, it is shown that various violations of the 2-PL model and the nominal 
response model can be evaluated using the Lagrange multiplier test or the equivalent efficient 
score test. The tests presented here focus on violation of local stochastic independence and 
insufficient capture of the form of the item characteristic curves. Primarily, the tests are item- 
oriented diagnostic tools, but taken together, they also serve the purpose of evaluation of global 
model fit. A useful feature of Lagrange multiplier statistics is that they are evaluated using 
maximum likelihood estimates of the null-model only, that is, the parameters of alternative 
models need not be estimated. As numerical examples, an application on real data and some 
power studies are presented. 

Key words: efficient score test, item response theory, model fit, modification indices, 
2-parameter logistic model, nominal response model, Lagrange multiplier test. 

Introduction

Interestingly, evaluation of model fit has a long tradition in the Rasch model (Andersen,
1973, Martin-Löf, 1973, 1974, Fischer, 1974, Kelderman, 1984, 1989, Molenaar, 1983, Glas,
1988, 1997, Glas & Verhelst, 1989, 1995), while contributions to the 2-PL model in this respect
have been relatively few (Yen, 1981, Mislevy & Bock, 1990, Reiser, 1996, Glas, 1998). One
of the reasons for this situation might be that the Rasch model and its variants have minimal
sufficient statistics, which are very helpful for the derivation of the asymptotic distribution of
the test statistics (see, for instance, Glas, 1997). On the other hand, the 2-PL model is a more
flexible model, so that the need for evaluation of model fit may be less stringent than in the
case of the more restrictive Rasch model. However, violations that threaten the validity of the
inferences made may also occur in the 2-PL model. In this paper, the focus will be on two
violations: improper modeling of the form of the item characteristic curves (ICC's) and lack of
local stochastic independence. In many respects, the model tests proposed here can be viewed
as generalizations of two tests for the Rasch model: the $R_1$-test for evaluation of the assumption
with respect to the form of the ICC's and the $R_2$-test for evaluation of the assumption of local
independence (Glas, 1988, 1997, Glas & Verhelst, 1989, 1995).

The procedures proposed here are based on the Lagrange multiplier (LM) statistic
(Aitchison & Silvey, 1958), rather than on likelihood ratio tests and Wald tests. This choice
is made because LM tests only need ML estimates of the parameters of the model of the
null-hypothesis. In the present case the null-model will be the 2-PL model, its generalization to
polytomous data, the nominal response model (NRM, Bock, 1972), and a special case of the
latter model, the generalized partial credit model (GPCM) by Muraki (1992). Generalization
of the approach presented here to the 3-PL model is beyond the scope of the present paper and
will be treated in a subsequent paper. In many instances, the parameters of the model of the
alternative hypothesis will be quite complicated to estimate. But even if this is not the case, the
procedure proposed here has advantages. In the sequel, hypotheses related to specific model
violations will be tested for one item or pair of items at a time. If this were done using a Wald or
likelihood ratio test, new estimates would have to be computed for every test. So primarily, the
procedures are meant as item-oriented diagnostic tools. However, below it will also be shown
that the ensemble of the computed statistics can serve the purpose of a global test of model
fit.

Preliminaries

Consider items where the possible responses are coded by the integers $0, 1, 2, \ldots, m_i$.
Let item $i$ have $m_i + 1$ response categories, indexed $g = 0, 1, \ldots, m_i$. Notice that
dichotomous items are the special case where $m_i = 1$. The response of a person $n$ to an item
$i$ will be represented by a vector $x_{ni}' = (x_{ni0}, \ldots, x_{nig}, \ldots, x_{nim_i})$, where $x_{nig}$ is a realization of
the random variable $X_{nig}$; $x_{nig} = 1$ if the response is in category $g$ and $x_{nig} = 0$ if this is not
the case. The probability of scoring in category $g$ of item $i$ is given by

$$\psi_{ig}(\theta_n) = \Pr(X_{nig} = 1 \mid \theta_n, \alpha_i, \beta_i) \propto \exp(\alpha_{ig}\theta_n - \beta_{ig}), \tag{1}$$

for $g = 0, 1, \ldots, m_i$, with the usual restriction $\alpha_{i0} = \beta_{i0} = 0$ to identify the model. Defining
$\psi_{ig}(\theta_n)$ starting from $g = 0$ may seem a bit awkward here, but below it will prove very
convenient. With the assumption of local independence between item responses, formulation
(1) encompasses the 2-PL model for dichotomous items (Birnbaum, 1972), and the nominal
response model (Bock, 1972, Thissen, 1991) for polytomous items. When the restriction
$\alpha_{ig} = g\alpha_i$ is imposed, the model is the GPCM by Muraki (1992).
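
As a minimal sketch of formulation (1), the category probabilities can be obtained by normalizing the exponentials over the categories of an item. The function name `category_probs` and the parameter values below are illustrative assumptions, not part of the original report.

```python
import math

def category_probs(theta, alpha, beta):
    """Category probabilities psi_ig(theta) under formulation (1).

    alpha[g] and beta[g] hold the parameters of category g = 0..m_i,
    with alpha[0] == beta[0] == 0 for identification.
    """
    logits = [a * theta - b for a, b in zip(alpha, beta)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Dichotomous item (2-PL, m_i = 1); the values are hypothetical.
p_2pl = category_probs(0.5, alpha=[0.0, 1.2], beta=[0.0, 0.3])

# GPCM restriction alpha_ig = g * alpha_i for an item with m_i = 2.
alpha_i = 0.8
p_gpcm = category_probs(0.5, alpha=[g * alpha_i for g in range(3)],
                        beta=[0.0, -0.2, 0.4])
```

For a dichotomous item the probability of category 1 reduces to the familiar logistic form of the 2-PL model.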

To introduce the LM tests, first some theory on MML estimation for IRT models
must be summarized. The choice of an ability distribution is not essential to the theory
presented here; it can be either the parametric (see Bock & Aitkin, 1982) or the non-parametric
MML framework (see De Leeuw & Verhelst, 1986, Follmann, 1988). However, to make the
presentation specific, the parametric framework will be assumed, and ability will be normally
distributed with parameters $\mu$ and $\sigma$. Further, for reasons of simplicity, it will first be assumed
that all respondents belong to the same population and have responded to the same set of
items. Modern software for the 2-PL model, such as Bilog-MG (Zimowski, Muraki, Mislevy,
& Bock, 1996), also supports multiple populations and incomplete designs. Generalization of
the methods to be presented to these specifications is straightforward and will be sketched in
Section 7. Further, this software also supports Bayes modal estimation (Mislevy, 1986). This
generalization will be discussed in Section 6.

Let $g(\cdot; \mu, \sigma)$ be the density of $\theta_n$. Since only one population is considered, the model
can be identified by introducing the restrictions $\mu = 0.0$ and $\sigma = 1.0$, and the remaining free
parameters are the vectors of item parameters $\alpha$ and $\beta$. The log-likelihood function of the
parameters $\xi' = (\alpha', \beta')$ can be written as

$$\ln L(\xi; X) = \sum_n \ln \Pr(x_n \mid \xi), \tag{2}$$

where $x_n$ is the response pattern of respondent $n$ and $X$ stands for the data matrix. To derive
the MML estimation equations, it proves convenient to introduce the vector of derivatives

$$b_n(\xi) = \frac{\partial}{\partial \xi} \ln \Pr(x_n, \theta_n; \xi) = \frac{\partial}{\partial \xi} \left[ \ln \Pr(x_n \mid \theta_n, \alpha, \beta) + \ln g(\theta_n; \mu, \sigma) \right] \tag{3}$$

with

$$\Pr(x_n \mid \theta_n, \alpha, \beta) = \prod_i \prod_{g=0}^{m_i} \psi_{ig}(\theta_n)^{x_{nig}}. \tag{4}$$

Adopting an identity by Louis (1982, also see Glas, 1992), the first order derivatives of (2)
with respect to $\xi$ can be written as

$$h(\xi) = \frac{\partial}{\partial \xi} \ln L(\xi; X) = \sum_n E(b_n(\xi) \mid x_n, \xi). \tag{5}$$

This identity greatly simplifies the derivation of the likelihood equations. For instance, it can
be easily verified that the elements of $b_n(\xi)$ are given by

$$b_n(\alpha_{ig}) = \theta_n (x_{nig} - \psi_{nig}) \tag{6}$$

and

$$b_n(\beta_{ig}) = \psi_{nig} - x_{nig}, \tag{7}$$

where $\psi_{nig}$ is a short-hand notation for $\psi_{ig}(\theta_n)$. Combining these two expressions with (5), the
likelihood equations for the item parameters are given by

$$\sum_n E(\theta_n x_{nig} \mid x_n, \xi) = \sum_n E(\theta_n \psi_{nig} \mid x_n, \xi) \tag{8}$$

and

$$\sum_n x_{nig} = \sum_n E(\psi_{nig} \mid x_n, \xi). \tag{9}$$
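
The posterior expectations in (5), (8) and (9) have no closed form, so in practice they are approximated numerically. The following is a minimal sketch for dichotomous items, assuming a fixed, equally spaced grid on the ability scale with normalized standard normal weights; the grid, the function names and the parameter values are illustrative assumptions and are not prescribed by the report.

```python
import math

def normal_grid(n_points=41, lo=-5.0, hi=5.0):
    """Equally spaced nodes with normalized standard normal weights."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + k * step for k in range(n_points)]
    dens = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]

def p_correct(theta, a, b):
    """2-PL probability of a correct response, formulation (1) with m_i = 1."""
    return 1.0 / (1.0 + math.exp(-(a * theta - b)))

def posterior_weights(x, alphas, betas, nodes, weights):
    """Return Pr(x | xi) and the posterior weights of theta on the nodes."""
    joint = []
    for t, w in zip(nodes, weights):
        lik = 1.0
        for xi, a, b in zip(x, alphas, betas):
            p = p_correct(t, a, b)
            lik *= p if xi == 1 else 1.0 - p
        joint.append(lik * w)
    marginal = sum(joint)
    return marginal, [j / marginal for j in joint]

def score_beta(data, item, alphas, betas, nodes, weights):
    """h(beta_i1) = sum_n E(psi_ni1 - x_ni1 | x_n, xi), combining (5) and (7)."""
    h = 0.0
    for x in data:
        _, post = posterior_weights(x, alphas, betas, nodes, weights)
        expected = sum(pw * p_correct(t, alphas[item], betas[item])
                       for t, pw in zip(nodes, post))
        h += expected - x[item]
    return h

nodes, weights = normal_grid()
alphas, betas = [1.0, 1.5], [0.0, 0.5]          # hypothetical item parameters
data = [(0, 0), (1, 0), (1, 1), (0, 1), (1, 1)]  # hypothetical response patterns
log_lik = sum(math.log(posterior_weights(x, alphas, betas, nodes, weights)[0])
              for x in data)                      # equation (2)
h_beta_first_item = score_beta(data, 0, alphas, betas, nodes, weights)
```

At the MML estimates the value returned by `score_beta` would be zero for every item, which is likelihood equation (9).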

For LM statistics, the second order derivatives of the log-likelihood function are also
needed. It will prove convenient to define

$$H(\xi, \xi) = -\frac{\partial^2 \ln L(\xi; X)}{\partial \xi \, \partial \xi'}. \tag{10}$$

As with the derivation of the estimation equations, the theory by Louis (1982) can also be used
for the derivation of the matrix of second order derivatives, and it follows that the observed
information matrix, evaluated using MML estimates, is given by

$$H(\xi, \xi) = -\sum_n \left[ E(B_n(\xi, \xi) \mid x_n, \xi) - E(b_n(\xi) b_n(\xi)' \mid x_n, \xi) \right], \tag{11}$$

where

$$B_n(\xi, \xi) = \frac{\partial^2 \ln \Pr(x_n, \theta_n; \xi)}{\partial \xi \, \partial \xi'}. \tag{12}$$

Notice that the expressions for the second of the two right-hand terms of (11) can be directly
derived from (6) and (7); the expressions for evaluating $B_n(\xi, \xi)$ are found upon taking
derivatives of these two expressions. The exact expressions for (11) can also be found in Glas
(1998).

For the 2-PL, the NRM, and the GPCM, the exact expressions for the second order
derivatives are still tractable, but for more complicated models, using (11) may become rather
complicated. A solution to this problem may be using the Fisher information matrix,

$$H(\xi, \xi) \approx \sum_n E(b_n(\xi) \mid x_n, \xi) \, E(b_n(\xi) \mid x_n, \xi)' \tag{13}$$

(also see Mislevy, 1986). Below it will be shown by numerical examples that this
approximation proves satisfactory for computing LM tests.
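
Approximation (13) only needs the per-respondent posterior expectations of the first order derivatives. A minimal sketch, assuming these vectors $E(b_n(\xi) \mid x_n, \xi)$ have already been computed (the name `cross_product_information` and the numbers are hypothetical):

```python
def cross_product_information(expected_scores):
    """Sum of outer products of the vectors E(b_n | x_n, xi), as in (13)."""
    k = len(expected_scores[0])
    H = [[0.0] * k for _ in range(k)]
    for s in expected_scores:
        for a in range(k):
            for b in range(k):
                H[a][b] += s[a] * s[b]
    return H

# Two respondents, two parameters; the values are made up for illustration.
H_approx = cross_product_information([[1.0, 2.0], [3.0, -1.0]])
```

The result is symmetric and positive semi-definite by construction, which is convenient when the matrix has to be inverted for the LM statistic.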

Lagrange multiplier tests 

The principle of the LM test (Aitchison & Silvey, 1958), and the equivalent efficient
score test (Rao, 1947), can be summarized as follows. Consider a null-hypothesis about a model
with parameters $\phi_0$. This model is a special case of a general model with parameters $\phi$. In
the present case, the special model is derived from the general model by fixing one or more
parameters to known constants. Let $\phi_0$ be partitioned as $\phi_0' = (\phi_{01}', \phi_{02}') = (\phi_{01}', c')$, where $c$
is the vector of the postulated constants. Let $h(\phi)$ be the partial derivatives of the log-likelihood
of the general model, so $h(\phi) = \partial \ln L(\phi) / \partial \phi$. This vector of partial derivatives gauges the
change of the log-likelihood as a function of local changes in $\phi$. Let $H(\phi, \phi)$ be defined as
$-\partial^2 \ln L(\phi) / \partial \phi \, \partial \phi'$. Then the LM statistic is given by

$$LM = h(\phi_0)' H(\phi_0, \phi_0)^{-1} h(\phi_0). \tag{14}$$

If this statistic is evaluated using the ML estimate of $\phi_{01}$ and the postulated values of $c$, it
has an asymptotic $\chi^2$-distribution with degrees of freedom equal to the number of parameters
fixed (Aitchison & Silvey, 1958). An important computational aspect of the procedure is that
at the point of the ML estimates $\hat{\phi}_{01}$ the free parameters have a partial derivative equal to zero.
Therefore, (14) can be computed as

$$LM(c) = h(c)' W^{-1} h(c) \tag{15}$$

with

$$W = H_{22}(c, c) - H_{21}(c, \hat{\phi}_{01}) H_{11}(\hat{\phi}_{01}, \hat{\phi}_{01})^{-1} H_{12}(\hat{\phi}_{01}, c),$$

where the partitioning of $H(\phi_0, \phi_0)$ into $H_{22}(c, c)$, $H_{21}(c, \phi_{01})$, $H_{11}(\phi_{01}, \phi_{01})$, and
$H_{12}(\phi_{01}, c)$ is according to the partition $\phi_0' = (\phi_{01}', \phi_{02}') = (\phi_{01}', c')$.
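
To make the computation in (15) concrete, the sketch below forms $W$ from the four blocks of $H$ and evaluates the quadratic form. It is an illustration under stated assumptions: the helper names `solve` and `lm_statistic` are hypothetical, the blocks are passed as plain nested lists, and a small Gauss-Jordan routine stands in for whatever linear algebra an actual implementation would use.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    A is a square matrix and B a matrix, both given as lists of rows.
    """
    n = len(A)
    M = [list(A[r]) + list(B[r]) for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def lm_statistic(h_c, H11, H12, H21, H22):
    """Return LM(c) = h(c)' W^{-1} h(c) and W, following (15)."""
    X = solve(H11, H12)  # H11^{-1} H12
    q, p = len(h_c), len(H11)
    W = [[H22[a][b] - sum(H21[a][k] * X[k][b] for k in range(p))
          for b in range(q)] for a in range(q)]
    y = solve(W, [[v] for v in h_c])  # W^{-1} h(c)
    lm = sum(h_c[a] * y[a][0] for a in range(q))
    return lm, W

# One free and one fixed parameter; the numbers are hypothetical.
lm, W = lm_statistic([2.0], H11=[[2.0]], H12=[[1.0]], H21=[[1.0]], H22=[[3.0]])
```

With these numbers $W = 3 - 1 \cdot 0.5 \cdot 1 = 2.5$ and $LM = 2^2 / 2.5 = 1.6$, to be referred to a $\chi^2$-distribution with one degree of freedom.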

Notice that $H_{11}(\phi_{01}, \phi_{01})$ also plays a role in the Newton-Raphson procedure for solving
the estimation equations and in computation of the observed information matrix. So its inverse
will usually be available at the end of the estimation procedure. Further, if the validity of the
model of the null-hypothesis is tested against various alternative models, the computational
work is reduced by the fact that the inverse of $H_{11}(\phi_{01}, \phi_{01})$ is already available and the order
of $W$ is equal to the number of parameters fixed. It is advisable to keep the number of
fixed parameters small to keep the interpretation of the outcome of the test tractable. This
interpretation is supported by observing that the value of (15) depends on the magnitude of
$h(c)$, that is, on the first order derivatives with respect to the parameters $\phi_{02}$ evaluated in $c$.
If the absolute values of these derivatives are large, the fixed parameters are bound to change
once they are set free, and the test is significant, that is, the special model is rejected. If the
absolute values of these derivatives are small, the fixed parameters will probably show little
change should they be set free, that is, the values at which these parameters are fixed in the
special model are adequate and the test is not significant.

Besides a test of significance, this approach also provides information with respect to
the direction in which the fixed parameters will change when set free. This is done by computing
a new value of the fixed parameters, say $\phi_{02}^*$, by performing one Newton-Raphson step, that is,

$$\phi_{02}^* = c + W^{-1} h(c). \tag{16}$$

Below, this new value will be called a modification index. The covariance matrix of $\phi_{02}^*$
can be approximated by $W^{-1}$. Assuming asymptotic normality of the estimates, it can then be
tested whether $\phi_{02}^*$ significantly differs from $c$, which boils down to performing the Rao (1947)
efficient score test.
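
For a single fixed parameter, (16) and the accompanying normal test reduce to scalar arithmetic. The sketch below assumes that the variance of the modification index is approximated by $W^{-1}$; the function name `modification_index` and the inputs are hypothetical.

```python
import math

def modification_index(c, h_c, w):
    """One Newton-Raphson step (16) for a scalar fixed parameter.

    Returns the modification index and the standardized change,
    assuming the variance of the index is approximated by 1 / w.
    """
    step = h_c / w
    phi_star = c + step
    standard_error = math.sqrt(1.0 / w)
    return phi_star, step / standard_error

phi_star, z = modification_index(c=0.0, h_c=2.0, w=2.5)
```

In this scalar case $z^2 = h(c)^2 / W$, which is exactly the LM statistic (15), illustrating why the normal test on the modification index and the efficient score test coincide.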


Evaluation of the Fit of Item Characteristic Curves 

For dichotomous items, Lord (1980, pp. 46-49) has pointed out that the expected
number right score $\sum_i \psi_i(\theta)$ and ability $\theta$ are the same things expressed on different scales of
measurement. The important difference is that the measurement scale of the expected number
right score depends on the test, while the measurement scale of $\theta$ is independent of the items
in the test. For polytomous items, the situation is more complicated; in fact, Hemker, Sijtsma,
Molenaar and Junker (1996) have shown that the unweighted sum score does not necessarily
have a monotone likelihood ratio in $\theta$. However, usually the unweighted sum score and the
associated estimate of $\theta$ will be highly correlated.

The idea of the LM test and modification index presented here will be to partition
the latent ability continuum into a number of segments, and to evaluate whether an item's ICC
conforms to the form predicted by the null-model in each of these segments. However, to be able to
properly define an LM statistic, the actual partitioning will take place on the observed total score
scale rather than on the $\theta$ scale. As already mentioned above, the LM tests and modification
indices developed here focus on specific items. So let the item of interest be labeled $i$, while
the other items are labeled $j = 1, 2, \ldots, i - 1, i + 1, \ldots, K$. Let $x_n^{(i)}$ be a response pattern without
item $i$, and let $r(x_n^{(i)})$ be the unweighted sum score on this response pattern, that is,

$$r(x_n^{(i)}) = \sum_{j \neq i} \sum_g g \, x_{njg}. \tag{17}$$

The possible scores $r(x_n^{(i)})$ will be partitioned into $S_i$ disjoint subsets; the index $i$ signifies
that this partition may be different for every item $i$. Consider the ordered boundary scores
$r_0 < r_1 < r_2 < \ldots < r_s < \ldots < r_{S_i}$, with $r_0 = 0$ and $r_{S_i} = \sum_{j \neq i} m_j$. Further, define

$$w(s, x_n^{(i)}) = \begin{cases} 1 & \text{if } r_{s-1} \le r(x_n^{(i)}) < r_s, \\ 0 & \text{otherwise}, \end{cases} \tag{18}$$

so $w(s, x_n^{(i)})$ is an indicator function which assumes a value equal to one if the unweighted sum
score of response pattern $x_n^{(i)}$ is in score range $s$. Because a partition of the score range also
induces a partition of the sample of respondents, the term sub-sample will be used to signify
groups of respondents with a sum score in a certain subset of the score range. The choice of the
number of subsets $S_i$ and the choice of the boundary scores will be returned to below.
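
The rest score (17) and the indicator (18) translate directly into code. This is a sketch with hypothetical names (`rest_score`, `subset_index`); a response pattern is represented by the category index scored on each item, and, as an assumption not spelled out in (18), the maximum score $r_{S_i}$ is assigned to the last subset.

```python
def rest_score(pattern, i):
    """Unweighted sum score on all items except item i, as in (17).

    pattern[j] is the category index g scored on item j.
    """
    return sum(g for j, g in enumerate(pattern) if j != i)

def subset_index(r, boundaries):
    """Return s in 1..S_i such that boundaries[s-1] <= r < boundaries[s].

    boundaries = [r_0, r_1, ..., r_S]; the top score r_S is placed in the
    last subset (an assumption made here so every score is classified).
    """
    n_subsets = len(boundaries) - 1
    for s in range(1, n_subsets + 1):
        if boundaries[s - 1] <= r < boundaries[s]:
            return s
    if r == boundaries[-1]:
        return n_subsets
    raise ValueError("score outside the range covered by the boundaries")

# Five dichotomous items, item of interest i = 0, two score subsets.
r = rest_score([1, 0, 1, 1, 1], i=0)
s = subset_index(r, boundaries=[0, 2, 4])
```

Respondents with the same value of `s` form one sub-sample in the sense used in the text.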

First, the case of the 2-PL and the NRM will be considered; generalization to the
GPCM will be sketched at the end of this section. The essence of the approach is introducing
an alternative model with item parameters $\alpha_{ig} + \gamma_{igs}$ and $\beta_{ig} + \delta_{igs}$. Consider a model
where the probability of scoring in category $g$ of item $i$, conditional on $w(s, x_n^{(i)}) = 1$, is given by

$$\eta_{igs}(\theta_n) \propto \exp((\alpha_{ig} + \gamma_{igs})\theta_n - (\beta_{ig} + \delta_{igs})), \tag{19}$$

for $g = 0, 1, \ldots, m_i$. Under the null-model, that is, the 2-PL model or NRM, $\gamma_{igs}$ and $\delta_{igs}$
will be equal to zero. In the alternative model, $\gamma_{igs}$ and $\delta_{igs}$ are free parameters, which gauge
the deviation of the discrimination and difficulty parameters in the sub-groups from the values
$\alpha_{ig}$ and $\beta_{ig}$. Some restrictions need to be imposed to identify this model. For instance, the
restrictions $\gamma_{i0s} = \delta_{i0s} = 0$ are imposed in addition to the usual restrictions $\alpha_{i0} = \beta_{i0} = 0$ to
identify (19) for fixed $s$. Further, the complete set of $S_i$ probabilities (19) can be identified using
the restrictions $\gamma_{igS_i} = \delta_{igS_i} = 0$. Under this parametrization, $\alpha_{ig}$ and $\beta_{ig}$ are the discrimination
and difficulty parameters of item $i$ in subgroup $S_i$, and $\gamma_{igs}$ and $\delta_{igs}$, $s = 1, \ldots, S_i - 1$, are the
deviations from this baseline in the other subgroups. An alternative to this parametrization will
be considered in the section where a numerical example will be given.

The probability of a response pattern $x_n$ is given by

$$\Pr(x_n \mid \theta_n, \alpha, \beta, \gamma_i, \delta_i) = \ldots \tag{20}$$

where $x_{ni}$ stands for the response on item $i$. Derivation of the LM test proceeds as follows.
Let $\eta_{igs}(\theta_n)$ be abbreviated $\eta_{nigs}$. For respondents $n$ with a sum score in category $s$, that is,
$w(s, x_n^{(i)}) = 1$, it holds that

$$b_n(\gamma_{igs}) = \theta_n (x_{nig} - \eta_{nigs}) \tag{21}$$

and

$$b_n(\delta_{igs}) = \eta_{nigs} - x_{nig}, \tag{22}$$

so the elements of the vectors of first order derivatives $h(\gamma_i)$ and $h(\delta_i)$ are given by

$$h(\gamma_{igs}) = \sum_{n \mid w(s, x_n^{(i)}) = 1} E(b_n(\gamma_{igs}) \mid x_n, \xi) \tag{23}$$

and

$$h(\delta_{igs}) = \sum_{n \mid w(s, x_n^{(i)}) = 1} E(b_n(\delta_{igs}) \mid x_n, \xi). \tag{24}$$

Notice that from inspection of (22) and (24), it follows that $h(\delta_{igs})$ is the difference of the
observed number of persons of sub-sample $s$ scoring in category $g$ of item $i$, and its expected
value. A test for the simultaneous hypothesis $\gamma_{igs} = 0$ and $\delta_{igs} = 0$, for $s = 1, \ldots, S_i - 1$
and $g = 1, \ldots, m_i$, can be based on a statistic $LM(\gamma_i, \delta_i)$, which is defined by (15) with
$\phi_{02}' = c' = (\gamma_i', \delta_i')$. When $LM(\gamma_i, \delta_i)$ is evaluated using MML estimates of the null-model,
that is, with MML estimates of $\xi$ and with $\gamma_i = 0$ and $\delta_i = 0$, $LM(\gamma_i, \delta_i)$ has an asymptotic
$\chi^2$-distribution with $2m_i(S_i - 1)$ degrees of freedom. It is also possible to define separate tests for
the hypothesis $\gamma_{igs} = 0$, for $s = 1, \ldots, S_i - 1$ and $g = 1, \ldots, m_i$, and the hypothesis $\delta_{igs} = 0$, for
$s = 1, \ldots, S_i - 1$ and $g = 1, \ldots, m_i$. The first test, say $LM(\gamma_i)$, can be based on the first order
derivatives $h(\gamma_i)$. This statistic $LM(\gamma_i)$ has an asymptotic $\chi^2$-distribution with $m_i(S_i - 1)$
degrees of freedom. In the same manner, a test based on a statistic $LM(\delta_i)$ can be defined for
the hypothesis $\delta_{igs} = 0$, for $s = 1, \ldots, S_i - 1$ and $g = 1, \ldots, m_i$, which also has an asymptotic
$\chi^2$-distribution with $m_i(S_i - 1)$ degrees of freedom.
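
The degrees of freedom of the ICC tests follow from counting the free deviation parameters. The small helper below, a hypothetical `icc_test_df`, tabulates them for the NRM (or 2-PL) and, when `gpcm=True`, for the GPCM variant, where each item has a single discrimination parameter and the counts become $S_i - 1$, $m_i(S_i - 1)$ and $(m_i + 1)(S_i - 1)$.

```python
def icc_test_df(m_i, S_i, gpcm=False):
    """Degrees of freedom of LM(gamma_i), LM(delta_i) and LM(gamma_i, delta_i)."""
    n_gamma = (S_i - 1) if gpcm else m_i * (S_i - 1)
    n_delta = m_i * (S_i - 1)
    return {"gamma": n_gamma, "delta": n_delta, "joint": n_gamma + n_delta}

df_2pl = icc_test_df(m_i=1, S_i=3)              # dichotomous item, 3 score subsets
df_gpcm = icc_test_df(m_i=3, S_i=4, gpcm=True)  # 4-category GPCM item
```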


Insert Table 1 about here 


The exact expressions for the matrices of second order derivatives needed for
evaluating (15) in the present case are found as follows. In the previous section, it was shown
that the observed information matrix for the null-model, $H_{11}(\phi_{01}, \phi_{01})$, with $\phi_{01} = \xi$, can
be derived using (11). This identity can also be used for deriving $H_{22}(c, c)$ and $H_{21}(c, \phi_{01})$,
with $c' = (\gamma_i', \delta_i')$. In Table 1, expressions are given for $B_n(\phi_a, \phi_b)$, where $\phi_a$ is equal to
$\gamma_{igs}$ or $\delta_{igs}$, and $\phi_b$ is equal to $\gamma_{ihs}$, $\delta_{ihs}$, $\alpha_{ih}$, or $\beta_{ih}$. It is
easily verified that elements $B_n(\gamma_{igs}, \gamma_{iht})$, $B_n(\delta_{igs}, \gamma_{iht})$ and $B_n(\delta_{igs}, \delta_{iht})$ are equal to zero
if $s \neq t$. Further, if $i \neq j$, $B_n(\gamma_{igs}, \gamma_{jhs}) = 0$, $B_n(\delta_{igs}, \delta_{jhs}) = 0$ and $B_n(\gamma_{igs}, \delta_{jhs}) = 0$.
The expression for $H_{22}(c, c)$ can now be derived applying (11). For instance, the elements
$E(B_n(\gamma_{igs}, \gamma_{ihs}) \mid x_n, \xi, \gamma_i, \delta_i)$ and $E(b_n(\gamma_{igs}) b_n(\gamma_{ihs}) \mid x_n, \xi, \gamma_i, \delta_i)$ must be summed over
all respondents with $w(s, x_n^{(i)}) = 1$. The expressions for $H_{21}(c, \phi_{01})$ are computed in a similar
manner.

Similar tests can also be derived for the GPCM (Muraki, 1992). In this model, every
item $i$ has but one discrimination parameter $\alpha_i$. Therefore, the shape of the ICC's is evaluated by
introducing $\alpha_i + \gamma_{is}$ and $\beta_{ig} + \delta_{igs}$ and testing the null-hypotheses $\gamma_{is} = 0$, $\delta_{igs} = 0$, or
both $\gamma_{is} = 0$ and $\delta_{igs} = 0$, for $s = 1, \ldots, S_i - 1$ and $g = 1, \ldots, m_i$. Again, these tests can be
based on statistics $LM(\gamma_i)$, $LM(\delta_i)$ and $LM(\gamma_i, \delta_i)$, which have $S_i - 1$, $m_i(S_i - 1)$ and
$(m_i + 1)(S_i - 1)$ degrees of freedom, respectively. Since the GPCM is derived from the NRM
by introducing the linear restrictions $\alpha_{ig} = g\alpha_i$, the matrix $H_{11}(\phi_{01}, \phi_{01})$ for the GPCM can be
derived from the equivalent matrix for the NRM by pre- and post-multiplying the latter with
the matrix of these linear restrictions and its transpose, respectively.
The definitions of $H_{22}(c, c)$ and $H_{21}(c, \phi_{01})$ are changed accordingly.

Evaluation of Local Stochastic Independence 

Evaluation of local independence will be based on alternative models which are 
generalizations of models proposed by Kelderman (1984) and Jannarone (1986) in the 
framework of the Rasch model. To grasp the flavor of these models, they will be presented 
here for dichotomous items first. Let item i and item j be two items where the responses are 
dependent. Consider a model given by 


$$\Pr(x_i, x_j \mid \theta, \alpha_i, \alpha_j, \beta_i, \beta_j, \gamma_{ij}, \delta_{ij}) \propto \exp[x_i(\alpha_i\theta - \beta_i) + x_j(\alpha_j\theta - \beta_j) + x_i x_j(\gamma_{ij}\theta - \delta_{ij})], \tag{25}$$

where $x_i$ and $x_j$ take the values 0 or 1. In Table 2, the probabilities of the combinations of
$x_i$ and $x_j$ are cross-tabulated. From inspection of this table, it can be seen that $\gamma_{ij}$ and $\delta_{ij}$ are
parameters modeling the association between the two items. First consider a model without $\gamma_{ij}$,
which is the 2-PL model version of the Kelderman (1984) model.


Insert Table 2 about here 


In this model $\delta_{ij}$ represents the addition to the item difficulty parameters $\beta_i$ and $\beta_j$ to account for
the probability of a simultaneous correct response to the two items. In the generalization of the
Jannarone (1986) model to the 2-PL model, besides an additional location parameter $\delta_{ij}$, also
an additional discrimination parameter $\gamma_{ij}$ is added. This parameter accounts for interaction
between the probability of a simultaneous correct response to the two items and the ability
dimension $\theta$.
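
The role of $\gamma_{ij}$ and $\delta_{ij}$ in model (25) can be seen by computing the four joint probabilities for a given ability value. This is a minimal sketch with a hypothetical function `pair_probs` and made-up parameter values; normalizing over the four response combinations is how the proportionality in (25) is resolved.

```python
import math

def pair_probs(theta, a_i, b_i, a_j, b_j, gamma=0.0, delta=0.0):
    """Joint probabilities of (x_i, x_j) under model (25) for one ability value."""
    weights = {}
    for x_i in (0, 1):
        for x_j in (0, 1):
            weights[(x_i, x_j)] = math.exp(
                x_i * (a_i * theta - b_i)
                + x_j * (a_j * theta - b_j)
                + x_i * x_j * (gamma * theta - delta))
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

# Null model: gamma = delta = 0 gives local independence.
independent = pair_probs(0.3, a_i=1.0, b_i=0.2, a_j=1.4, b_j=-0.1)
# Negative delta raises the probability of a simultaneous correct response.
dependent = pair_probs(0.3, a_i=1.0, b_i=0.2, a_j=1.4, b_j=-0.1, delta=-0.5)
```

With both association parameters at zero the joint probability of two correct responses factors into the product of the two 2-PL probabilities, which is the null-hypothesis tested in this section.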

This approach to modeling dependence between item responses can be generalized
further to polytomous items by adding the appropriate number of rows and columns to the
cross-tabulation of Table 2 and adding the parameters needed to model the additional row and
column effects. As a result, the model for a simultaneous response to item $i$ and item $j$ becomes

$$\zeta_{nigjh} = \Pr(X_{nig} = 1, X_{njh} = 1 \mid \theta_n, \alpha_i, \alpha_j, \beta_i, \beta_j, \gamma_{ij}, \delta_{ij}) \propto \exp[(\alpha_{ig}\theta_n - \beta_{ig}) + (\alpha_{jh}\theta_n - \beta_{jh}) + (\gamma_{igjh}\theta_n - \delta_{igjh})], \tag{26}$$

where $\gamma_{igjh} = \delta_{igjh} = 0$ if either $g = 0$ or $h = 0$. The probability of a response pattern changes
from (4) to

$$\Pr(x_n \mid \theta_n, \alpha, \beta, \gamma_{ij}, \delta_{ij}) = \prod_{g=0}^{m_i} \prod_{h=0}^{m_j} \zeta_{nigjh}^{x_{nig} x_{njh}} \prod_{k \neq i,j} \prod_{l=0}^{m_k} \psi_{nkl}^{x_{nkl}}. \tag{27}$$

LM tests and modification indices for assessing lack of local independence can be based on
derivatives of the log-likelihood with respect to $\gamma_{igjh}$ and $\delta_{igjh}$, evaluated under the null-model
where $\gamma_{igjh} = 0$ and $\delta_{igjh} = 0$. Finding these derivatives, denoted $h(\gamma_{ij})$ and $h(\delta_{ij})$, again
proceeds using expression (5). So inserting

$$b_n(\gamma_{igjh}) = \theta_n (x_{nig} x_{njh} - \zeta_{nigjh}) \tag{28}$$

and

$$b_n(\delta_{igjh}) = \zeta_{nigjh} - x_{nig} x_{njh} \tag{29}$$

into (5) produces the desired expressions. Notice that $h(\delta_{igjh})$ is the difference
between observing simultaneous responses $x_{nig} x_{njh}$ and its expected value $E(\zeta_{nigjh} \mid x_n, \xi, \gamma_{ij}, \delta_{ij})$.
In the same manner, the expression for $h(\gamma_{igjh})$ is the difference between
$x_{nig} x_{njh} E(\theta_n \mid x_n, \xi, \gamma_{ij}, \delta_{ij})$ and $E(\zeta_{nigjh}\theta_n \mid x_n, \xi, \gamma_{ij}, \delta_{ij})$.

A test for the composite null-hypothesis $\delta_{igjh} = 0$ and $\gamma_{igjh} = 0$, for $g = 1, \ldots, m_i$
and $h = 1, \ldots, m_j$, can be based on $LM(\gamma_{ij}, \delta_{ij})$, which is defined by (15) with
$\phi_{02}' = c' = (\gamma_{ij}', \delta_{ij}')$. When this statistic is evaluated using MML estimates of the null-model, it has an
asymptotic $\chi^2$-distribution with $2m_i m_j$ degrees of freedom.

The matrix of weights $W$ defined in (15) can again be found using (11). Therefore,
expressions for $B_n(\phi_a, \phi_b)$ are needed, where $\phi_a$ and $\phi_b$ are $\gamma_{igjh}$ and $\delta_{igjh}$ or a parameter of
the null-model. The needed expressions are tabulated in Table 3; they easily follow from taking
derivatives of (28) and (29).


As in the previous section, also here special tests can be defined for the hypothesis
$\gamma_{igjh} = 0$, $g = 1, \ldots, m_i$ and $h = 1, \ldots, m_j$, and $\delta_{igjh} = 0$, $g = 1, \ldots, m_i$ and $h = 1, \ldots, m_j$. These
tests, denoted $LM(\gamma_{ij})$ and $LM(\delta_{ij})$, are defined by (15) with $c = \gamma_{ij}$ and $c = \delta_{ij}$, respectively.
They both have $m_i m_j$ degrees of freedom. Analogous to the previous section, tests for the
GPCM can be defined as special cases of tests for the NRM. These tests, denoted $LM(\gamma_{ij})$,
$LM(\delta_{ij})$ and $LM(\gamma_{ij}, \delta_{ij})$, have one, $m_i m_j$ and $2m_i m_j$ degrees of freedom, respectively.


Insert Table 3 about here


Modification Indices in a Bayes Modal Framework 


It is well-known that item parameter estimates in the 2-PL model (and the 3-PL model,
which is beyond the scope of the present paper) are sometimes hard to obtain, because the
parameters are poorly determined by the available data, in the sense that in the region of the
ability scale where the respondents are located, the ICC's can be appropriately described by
a large number of sets of item parameter values. To obtain "reasonable" and finite estimates,
Mislevy (1986) considers a number of Bayesian approaches, entailing the introduction of prior
distributions on the parameters. In the present section, it will be shown how the LM tests and
modification indices presented above can accommodate these assumptions. In particular, two
approaches will be studied: in the first approach the prior distribution is fixed; in the second
approach, often labeled an empirical Bayes approach, the parameters of the prior distribution
are estimated along with the other parameters. Let $p(\xi \mid \eta)$ be the prior density of $\xi$,
$\xi' = (\alpha', \beta')$, characterized by parameters $\eta$, which in turn follow a density $p(\eta)$. In a Bayes
modal framework, parameter estimates are computed by maximizing the posterior density of
$\xi$, the logarithm of which is, up to an additive constant, $\ln L(\xi; X) + \ln p(\xi \mid \eta) + \ln p(\eta)$.

First, the prior distribution of $\xi$ will be considered known. Let $d(\xi) = \partial \ln p(\xi \mid
\eta) / \partial \xi$ and $D(\xi, \xi) = -\partial^2 \ln p(\xi \mid \eta) / \partial \xi \, \partial \xi'$. The first order derivatives of the posterior
with respect to $\xi$, say $h^*(\xi)$, are given by $h^*(\xi) = h(\xi) + d(\xi)$, where $h(\xi)$ is defined
in (5), and the Bayes modal estimates are found upon solving $h^*(\xi) = 0$. The opposite of
the second order derivatives of the posterior with respect to $\xi$, say $H^*(\xi, \xi)$, are given by
$H^*(\xi, \xi) = H(\xi, \xi) + D(\xi, \xi)$, where $H(\xi, \xi)$ is defined in (10). Substituting $H^*(\xi, \xi)$
for $H_{11}(\xi, \xi)$ in the above LM statistics defines the comparable statistics for a Bayes modal
framework with a fixed prior.
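
As a small worked illustration of the fixed-prior case, consider a single parameter that is given a normal prior with known mean and standard deviation. The sketch below evaluates $d$ and $D$ for that prior and adds them to given values of $h$ and $H$; the function names and all numbers are hypothetical, and the choice of which parameter (for example a log-discrimination) carries the prior is left to the reader.

```python
def normal_prior_terms(value, mean, sd):
    """d = d ln p / d value and D = -d^2 ln p / d value^2 for a normal prior."""
    d = -(value - mean) / sd ** 2
    D = 1.0 / sd ** 2
    return d, D

def posterior_derivatives(h, H, value, mean, sd):
    """h* = h + d and H* = H + D for a scalar parameter with a fixed prior."""
    d, D = normal_prior_terms(value, mean, sd)
    return h + d, H + D

h_star, H_star = posterior_derivatives(h=1.0, H=10.0, value=0.5, mean=0.0, sd=0.5)
```

The prior pulls the first order derivative toward the prior mean and adds curvature, so $H^*$ is larger than $H$; this is what keeps poorly determined parameters finite.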

In an empirical Bayes framework, the parameters $\eta$ are estimated. Consider the
definitions of Section 3. The parameter vector $\phi_0$ was partitioned $(\phi_{01}', \phi_{02}')$. In the present
context, $\phi_{01}$ is the concatenation $(\xi', \eta')$ and $\phi_{02}$ can be $\gamma_i$, $\delta_i$, or their concatenation, or
$\gamma_{ij}$, $\delta_{ij}$, or their concatenation, all depending on the hypothesis considered. The first order
derivatives of the posterior distribution are given by $h^*(\phi_0) = \partial \ln L(\xi; X) / \partial \phi_0 + \partial \ln p(\xi \mid
\eta) / \partial \phi_0 + \partial \ln p(\eta) / \partial \phi_0$, which will be written as
$h^*(\phi_0) = h(\phi_0) + d(\phi_0) + g(\phi_0)$. So empirical Bayes modal estimation entails
solving $h^*(\xi) = 0$ and $h^*(\eta) = 0$, that is, $h^*(\xi) = h(\xi) + d(\xi) = 0$ and $h^*(\eta) = d(\eta) + g(\eta)
= 0$.

The opposite matrix of second order derivatives will be partitioned

$$H^*(\phi_0, \phi_0) = \begin{pmatrix} H^*_{11}(\xi, \xi) & H^*_{11}(\xi, \eta) & H^*_{12}(\xi, \phi_{02}) \\ H^*_{11}(\eta, \xi) & H^*_{11}(\eta, \eta) & H^*_{12}(\eta, \phi_{02}) \\ H^*_{21}(\phi_{02}, \xi) & H^*_{21}(\phi_{02}, \eta) & H^*_{22}(\phi_{02}, \phi_{02}) \end{pmatrix}.$$

Let $H(\phi_0, \phi_0) = -\partial^2 \ln L(\xi; X) / \partial \phi_0 \, \partial \phi_0'$, $D(\phi_0, \phi_0) = -\partial^2 \ln p(\xi \mid \eta) / \partial \phi_0 \, \partial \phi_0'$, and
$G(\phi_0, \phi_0) = -\partial^2 \ln p(\eta) / \partial \phi_0 \, \partial \phi_0'$. Then the opposite matrix of second order derivatives
becomes

$$H^*(\phi_0, \phi_0) = \begin{pmatrix} H_{11}(\xi, \xi) + D(\xi, \xi) & D(\xi, \eta) & H_{12}(\xi, \phi_{02}) \\ D(\eta, \xi) & D(\eta, \eta) + G(\eta, \eta) & 0 \\ H_{21}(\phi_{02}, \xi) & 0 & H_{22}(\phi_{02}, \phi_{02}) \end{pmatrix}.$$

Replacing $H(\phi_0, \phi_0)$ in the above statistics by $H^*(\phi_0, \phi_0)$ gives the definitions of the equivalent
statistics for the empirical Bayes modal framework. In the numerical examples given below,
for dichotomous items, a fixed normal prior on the logarithm of $\alpha$ with parameters $\mu_{\ln\alpha}$ and $\sigma_{\ln\alpha}$
will be used. For the empirical Bayes example, the natural conjugate prior for the normal
distribution will be used, which is normal for $\mu_{\ln\alpha}$ given $\sigma^2_{\ln\alpha}$ and inverted Wishart for $\sigma^2_{\ln\alpha}$
(Ando & Kaufman, 1965). For details on this procedure one is referred to Mislevy (1986).


Multiple Populations and Incomplete Designs

Above, for reasons of simplicity, it was assumed that all respondents were drawn from
the same population and responded to the same set of items. Generalization to a situation where
this is not the case proceeds as follows. Firstly, it will be assumed that $Q$ populations have
normal ability distributions indexed by $\mu_q$ and $\sigma_q$, $q = 1, \ldots, Q$. Further, $q(n)$ is the population
to which respondent $n$ belongs. To identify the model, the first ability distribution will be fixed
to standard normal, and the definition of the vector of free model parameters $\xi$ is now extended
to $\xi' = (\alpha', \beta', \mu_2, \sigma_2, \ldots, \mu_Q, \sigma_Q)$. Secondly, a missing data indicator $z_n$ will be introduced.
This vector has elements $z_{ni}$ equal to one if a response of person $n$ to item $i$ is observed, and
zero otherwise. In the present context, it will be assumed that the ignorability principle by
Rubin (1976) holds, that is, the missing data indicator does not depend on the unobserved
responses. As a consequence, parameters can be estimated using a likelihood function or a
posterior distribution that is conditional on the value of the missing data indicator. Therefore,
(4) and (5) now become

$$\ldots + \ln g(\theta_n; \mu_{q(n)}, \sigma_{q(n)})]. \tag{30}$$
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The likelihood equations for the population parameters are derived upon observing that 


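$$
b_n(\mu_{q(n)}) = (\theta_n - \mu_{q(n)})\,\sigma_{q(n)}^{-2} \tag{31}
$$

and

$$
b_n(\sigma_{q(n)}) = -\sigma_{q(n)}^{-1} + (\theta_n - \mu_{q(n)})^2\,\sigma_{q(n)}^{-3}. \tag{32}
$$

Again using (5), first order derivatives can be derived, and estimation equations are given by

$$
\mu_q = \frac{1}{N_q} \sum_{n \mid q(n) = q} E(\theta_n \mid x_n, z_n, \xi) \tag{33}
$$

and

$$
\sigma_q^2 = \frac{1}{N_q} \sum_{n \mid q(n) = q} E(\theta_n^2 \mid x_n, z_n, \xi) - \mu_q^2, \tag{34}
$$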

where $N_q$ is the number of respondents in the sample of population $q$. First order derivatives for
item parameters are derived from (8) and (9) by replacing the summations in these equations
by summations over respondents $n$ with $z_{ni}$ equal to one, that is, the estimation equation for the
parameters of item $i$ only depends on persons who have actually responded to item $i$. In the same
manner, expressions for first order derivatives can be derived for item parameters of alternative
models and for second order derivatives of item parameters.
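
The estimation equations (33) and (34) can be read as a moment update: within each population, the posterior expectations of ability and squared ability are averaged over its respondents. The following is a minimal sketch of that step only, assuming the posterior expectations have already been computed elsewhere (for instance by numerical integration). The function name and the toy numbers are hypothetical and not taken from the report.

```python
from collections import defaultdict
from math import sqrt


def update_population_parameters(population, post_mean, post_mean_sq):
    """Moment update in the spirit of (33) and (34).

    population[n]   : index q(n) of the population of respondent n
    post_mean[n]    : E(theta_n | x_n, z_n, xi), assumed precomputed
    post_mean_sq[n] : E(theta_n**2 | x_n, z_n, xi), assumed precomputed
    Returns {q: (mu_q, sigma_q)}.
    """
    # Per population: N_q, sum of E(theta), sum of E(theta^2)
    totals = defaultdict(lambda: [0, 0.0, 0.0])
    for q, m1, m2 in zip(population, post_mean, post_mean_sq):
        totals[q][0] += 1
        totals[q][1] += m1
        totals[q][2] += m2
    estimates = {}
    for q, (n_q, s1, s2) in totals.items():
        mu = s1 / n_q                  # equation (33)
        variance = s2 / n_q - mu ** 2  # equation (34)
        estimates[q] = (mu, sqrt(variance))
    return estimates


# Toy input (hypothetical numbers, not taken from the report).
population = [1, 1, 2, 2]
post_mean = [-0.5, 0.5, 1.0, 2.0]
post_mean_sq = [1.25, 1.25, 2.0, 5.0]
estimates = update_population_parameters(population, post_mean, post_mean_sq)
```

Because the first ability distribution is fixed to standard normal for identification, only the entries for $q = 2, \ldots, Q$ would be used in practice.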

Imputing these generalized definitions of $h(\phi)$ and $H(\phi, \phi)$ into the definitions of
the tests for local independence, $LM(\gamma_{ij}, \delta_{ij})$, $LM(\gamma_{ij})$ and $LM(\delta_{ij})$, results in statistics
which can also be applied in the framework of multiple populations and incomplete designs.
For the tests for the shape of the ICC's, $LM(\gamma_i, \delta_i)$, $LM(\gamma_i)$ and $LM(\delta_i)$, some additional
provisions need to be made. This has to do with the fact that the definition of the alternative
model depends on the respondents' sum scores $r(x_n^{(i)})$, which, in turn, depend on the partial
response patterns $x_n^{(i)}$. However, when every person responds to a unique set of items, setting
boundary scores for partitioning the score continuum becomes quite difficult, because the
rationale of the procedure is that respondents grouped together should be approximately located
in the same region of the latent ability space. Choosing boundary scores related to the proportion
of correct responses is very crude, because the proportion of correct scores not only depends
on the persons' ability, but also on the difficulty of the items. Partitioning the latent ability
continuum and then deriving boundary scores for the respondents is extremely laborious and
completely undermines the philosophy of the approach. Therefore, application of these statistics
must be confined to designs where the sample of respondents is split up into a number of groups
of respondents who were administered the same set of items. The sets of items are often called
booklets. Then, for every booklet, boundary scores are set in such a way that the resulting score
ranges roughly reflect comparable ability levels across booklets. This approach is the same as
the approach of the comparable $S_i$-test for the Rasch model (Glas & Verhelst, 1995).

A Numerical Example 

The aim of this section is to give an example of the use of LM tests and modification
indices using real data. The data are a completely random sample of the data emanating from the
central national examinations in secondary education in the Netherlands in 1995. The items used
are from a test concerning reading comprehension in English. To keep the presentation compact,
only the first 10 items of this examination will be used. However, the results did prove typical for
the complete examination. In Table 4, an overview of the data and the MML estimates is given.
The second and third column, labeled "p-value" and "rit", contain the observed proportion correct
scores for the items and the item-test correlations. The frequency distribution of the respondents'
unweighted sum scores is displayed in the last column. The remaining columns contain MML
estimates of the parameters and estimates of their standard errors. The columns labeled "Se*(.)"
contain standard errors computed using the observed information matrix, given by (11); the
columns labeled "Se(.)" contain standard errors computed using the Fisher information matrix,
given by (13). It can be seen that these two estimates of the standard errors are very close indeed.


Insert Table 4 about here 


Next, for these 10 items, LM statistics were computed; an overview is given in Table
5. In this example, for every item, the score range was divided into four sections. There are
several considerations pertaining to the choice of the number of subsets $S_i$ of the score range
and the choice of the boundary scores. Generally speaking, the number of score groups will
depend on the number of items and the number of respondents available. Inspection of (22)
and (24) reveals that, for polytomous items, the first order derivatives $h(\delta_i)$ are the difference
between the number of persons obtaining an unweighted sum score in category $s$ and scoring
in category $g$ of item $i$, $\sum_{n \mid r(x_n^{(i)}) = s} x_{nig}$, and its expected value.


Insert Table 5 about here 


For dichotomous items, this boils down to the difference between the number of persons in score
group $s$ answering the item correctly and its expected value. Therefore, it may be a good strategy to form
the subgroups in such a way that the observed and expected frequencies are not too low, which
can be supported by setting the boundary scores in such a way that the numbers of respondents
in each subgroup are comparable. As an aside, it must be mentioned that the fact that the
magnitude of $h(\delta_i)$ depends on a difference between observed and expected frequencies will
be helpful in assessing the severity of the model violations. Due to a large sample size, $\delta_i$ may
differ significantly from zero, yet the severity of the violation in terms of a difference between
the observed and expected frequencies may be insignificant from a practical point of view.
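
The advice to set boundary scores so that the subgroups hold comparable numbers of respondents can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for Table 5: it uses the frequency distribution of the total sum score from Table 4 as a stand-in for the distribution of the partial sum score $r(x_n^{(i)})$, and the function names are hypothetical.

```python
# Frequency of unweighted sum scores 0..10, taken from Table 4 (N = 2039).
SCORE_FREQ = [2, 7, 23, 92, 175, 314, 380, 425, 333, 224, 64]


def boundary_scores(freq, n_groups):
    """Return the highest score of each group except the last one.

    Each boundary is the score whose cumulative frequency is closest to
    k/n_groups of the sample, while keeping every group non-empty.
    """
    total = sum(freq)
    cumulative, running = [], 0
    for f in freq:
        running += f
        cumulative.append(running)
    bounds, lowest = [], 0
    for k in range(1, n_groups):
        target = k * total / n_groups
        # Leave at least one score level for each remaining group.
        candidates = range(lowest, len(freq) - (n_groups - k))
        best = min(candidates, key=lambda s: abs(cumulative[s] - target))
        bounds.append(best)
        lowest = best + 1
    return bounds


def group_sizes(freq, bounds):
    """Number of respondents in each score group implied by the bounds."""
    sizes, start = [], 0
    for b in bounds + [len(freq) - 1]:
        sizes.append(sum(freq[start:b + 1]))
        start = b + 1
    return sizes


bounds = boundary_scores(SCORE_FREQ, 4)
sizes = group_sizes(SCORE_FREQ, bounds)
```

With a discrete score distribution the groups can only be made roughly equal in size, so the resulting subgroup counts should be inspected before the tests are computed.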

The columns labeled "LM*(.)" contain values for LM statistics computed using exact
expressions for the matrix of second order derivatives. Since four subgroups were formed,
$LM^*(\gamma_i, \delta_i)$ has 6 degrees of freedom and $LM^*(\gamma_i)$ and $LM^*(\delta_i)$ both have 3 degrees of
freedom. To keep the presentation concise, association between items was evaluated for
consecutive items only; the results are displayed in the second panel of Table 5. Here,
$LM^*(\gamma_{ij}, \delta_{ij})$ has 2 degrees of freedom and $LM^*(\gamma_{ij})$ and $LM^*(\delta_{ij})$ both have one degree
of freedom. Finally, the columns of Table 5 labeled "LM($\delta_i$)" and "LM($\delta_{ij}$)" contain LM
statistics computed using the Fisher information matrix. Inspection shows that the values of
these statistics are very close to the values obtained using the exact expressions for the second
order derivatives. This result was typical for all analyses made in this numerical example.
Therefore, in this section no further comparisons between the two approaches will be presented.
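
The significance probabilities in Table 5 follow from referring each LM statistic to a chi-square distribution with the degrees of freedom stated above. The sketch below shows that conversion for integer degrees of freedom using only the standard library. The helper name is hypothetical, and the four example statistics are values read from Table 5.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return exp(-half) * total
    # Odd df: start from df = 1 and add one correction term per two df.
    total = erfc(sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += exp(-half) * half ** (k - 0.5) / gamma(k + 0.5)
    return total


# (statistic, degrees of freedom) pairs read from Table 5.
examples = {
    "item 2, joint test": (11.93, 6),
    "item 2, discrimination": (12.64, 3),
    "items 3 and 4, joint test": (18.91, 2),
    "items 1 and 2, discrimination": (3.39, 1),
}
p_values = {name: chi2_sf(x, df) for name, (x, df) in examples.items()}
```

Rounded to two decimals, these tail probabilities can be compared with the "p" columns of Table 5.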

Although the primary aim of the tests presented here is to serve as item-oriented
diagnostic tools, they also serve the purpose of evaluation of global model fit, especially if
the number of items is large. Consider the ten significance probabilities of the $LM(\gamma_i, \delta_i)$
test displayed in the third column of Table 5. Under the null-model, that is, under the 2-PL
model, these ten significance probabilities should have an approximate uniform distribution.
Of course, this is only an approximation, because these 10 statistics are dependent. For reasons
of dependence, one should not combine the significance probabilities of the $LM^*(\gamma_i, \delta_i)$-,
the $LM^*(\gamma_i)$- and the $LM^*(\delta_i)$-statistics, because the dependence between these three statistics
is too prominent. This can, for instance, be verified by inspection of Table 5. The same line of
argument also applies to the three tests focused on dependence between the items. Although
the requirement of independence is also not fulfilled within one LM test replicated over items,
here the dependence is far less prominent, and a fair approximation to the uniform distribution
favors the model. On the other hand, a majority of low significance probabilities is indicative
of global model violation. If, for instance, a significance level of 10% is used, a percentage of
significant tests that greatly exceeds 10% is an indication of poor global model fit.
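
The two descriptive checks mentioned above, the share of significant tests and the closeness of the significance probabilities to a uniform distribution, can be computed as in the following sketch. It uses the ten significance probabilities in the third column of Table 5. The distance measure (a Kolmogorov-type maximum deviation) is an assumption chosen for illustration and is not a procedure from the report. Because the ten statistics are dependent, the output is a rough summary only.

```python
# Significance probabilities of the joint ICC test for items 1..10 (Table 5).
P_VALUES = [0.78, 0.06, 0.04, 0.82, 0.62, 0.28, 0.07, 0.29, 0.09, 0.17]


def proportion_significant(p_values, level):
    """Share of tests with a significance probability below the level."""
    return sum(p < level for p in p_values) / len(p_values)


def distance_to_uniform(p_values):
    """Largest gap between the empirical distribution and the uniform one."""
    ordered = sorted(p_values)
    n = len(ordered)
    above = max((i + 1) / n - p for i, p in enumerate(ordered))
    below = max(p - i / n for i, p in enumerate(ordered))
    return max(above, below)


share = proportion_significant(P_VALUES, 0.10)
gap = distance_to_uniform(P_VALUES)
```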


Insert Table 6 about here 


The next interesting question is whether the results of Table 5 are much different in
a Bayes modal framework. Two analyses were made, one analysis where the discrimination
parameters are assumed to be drawn from a known log-normal distribution, and an empirical
Bayes analysis where the conjugate prior is introduced to this log-normal distribution. In the
first case, the parameters of the log-normal distribution are fixed to $\mu_{\ln\alpha} = 0.0$ and $\sigma_{\ln\alpha} = 0.5$.
The reason here is that these are the default values in Bilog-MG (Zimowski, Muraki, Mislevy,
& Bock, 1996), which will probably be the software most used by practitioners. Above,
it was already mentioned that the conjugate prior for the normal distribution is normal for $\mu_{\ln\alpha}$
given $\sigma^2_{\ln\alpha}$ and inverted Wishart for $\sigma^2_{\ln\alpha}$ (Ando & Kaufman, 1965). Using the terminology of
Mislevy (1986), for this last distribution the parameters m = 5 and 6=1 were chosen. The
results of computation of the LM statistics are shown in Table 6. Generally, the pattern of
significant indices remains the same. For instance, using a 10% significance level, for all three
analyses, item 7 has a significant $LM(\gamma_i, \delta_i)$ and $LM(\delta_i)$. In the same manner, the item pair
3 and 4 has a significant $LM(\gamma_{ij}, \delta_{ij})$-, $LM(\gamma_{ij})$- and $LM(\delta_{ij})$-test in all three analyses.
However, sometimes the pattern changes. For instance, the significant $LM(\gamma_i, \delta_i)$-test for item
3 disappears in the Bayesian analyses and a significant $LM(\gamma_{ij}, \delta_{ij})$-test for the item pair 2 and
3 appears in the empirical Bayes analysis. It must be mentioned that such changes occurred less
often when the number of items in the analysis was higher.


Insert Figure 1 about here 



Insert Table 7 about here 


In Section 3, it was sketched that using (16) an estimate of a freed fixed parameter
can be computed by performing one Newton-Raphson step. Standard errors of these one-step
estimates can be computed using the diagonal elements of W. In Section 4, the alternative
model was identified by imposing $\gamma_{igS_i} = \delta_{igS_i} = 0$. Therefore, for $s = 1, \ldots, S_i - 1$,
the parameters $\gamma_{igs}$ and $\delta_{igs}$ can be viewed as the deviations from the discrimination and
difficulty parameter of group $S_i$, respectively. However, in practice it proves more elegant
to have confidence intervals for all $S_i$ score levels. Therefore, the MML estimates of $\alpha_{ig}$ and
$\beta_{ig}$ will be imputed in the alternative model as fixed constants, so that the parameters $\gamma_{igs}$
and $\delta_{igs}$ can be viewed as the deviations from these estimates for all groups $s = 1, \ldots, S_i$.
This alternative parametrization entails that, for the computation of LM tests and modification
indices, the elements of $h(\xi)$ and $H(\xi, \xi)$ associated with $\alpha_{ig}$ and $\beta_{ig}$ should be removed.
Because this is just a simple reparametrization of the alternative model, this operation does not
alter the outcome of the LM test.

In Table 7, one-step estimates are computed for the first two items; the results are
displayed under the heading "Modification Indices". Assuming asymptotic normality
of these estimates, they can be transformed into standardized normal indices. An example
using the two items of Table 7 is shown in Figure 1. The circles signify standardized one-step
estimates of $\gamma_{is}$, the triangles the standardized one-step estimates of $\delta_{is}$. Using these displays,
the locus of misfit can be identified at a glance. For instance, the lack of fit of the second
item is mainly due to the low score level. Of course, an interesting question is how much the
freed parameters will change if new MML estimates are computed, both for the parameters of
the initial 2-PL model and the parameters of interest. In Table 7, these estimates are displayed
under the heading "Parameter Estimates". It can be seen that these estimates are little different
from the estimates under the heading "Modification Indices". The parameters of item 2 in group
4 seem an exception, but this apparent effect vanishes when the estimates are standardized by
their standard error. The fact that new MML estimates were computed for all parameters in
the model supports computing a likelihood ratio test. The log-likelihood of the original model
equaled -12028.783, the model with additional parameters for item 1 resulted in -12027.109,
and the model with additional parameters for item 2 resulted in -12023.459. So, the LR-test for
the first item has a value 3.348 (df = 6, p = 0.764), and the test for the second item has a value
10.648 (df = 6, p = 0.100). This is in accordance with the other results: the first item seems
to fit, the second might be called a borderline case. The strategy used here may serve as a
prototype: first, compute LM tests and modification indices, which can be done quickly without
additional estimation, then perform additional estimation for items where model fit appears to
be troublesome, and, finally, relax the model in cases of serious violations.
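
The likelihood ratio tests above can be recomputed from the three reported log-likelihoods: the statistic is twice the gain in log-likelihood, referred to a chi-square distribution with 6 degrees of freedom. The sketch below uses a closed form for the chi-square tail that holds for even degrees of freedom only; the function names are hypothetical.

```python
from math import exp


def chi2_sf_even_df(x, df):
    """Upper tail of a chi-square variable; closed form for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total


def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """Return (LR statistic, significance probability)."""
    statistic = 2.0 * (loglik_alt - loglik_null)
    return statistic, chi2_sf_even_df(statistic, df)


LOGLIK_NULL = -12028.783  # original 2-PL model, as reported in the text
lr_item1 = likelihood_ratio_test(LOGLIK_NULL, -12027.109, 6)
lr_item2 = likelihood_ratio_test(LOGLIK_NULL, -12023.459, 6)
```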

A Power Study 

In this section, an unassuming power study will be presented. It will in no way
be exhaustive, because that would need a systematic variation of sample size, test length,
parameter values and model violations, which is far beyond the scope of this paper. The main
purpose of the study was to get a general idea of the power of the LM tests. The arrangement
of the study reported is quite arbitrary. However, the results are not significantly different
from some other simulation studies that were carried out. The sample size was equal to 1000
respondents; 9 dichotomous items were used. The item discrimination parameters were equal to
(0.5, 0.5, 0.5, 1, 1, 1, 1.5, 1.5, 1.5). The item difficulties were (-1, 0, 1, -1, 0, 1, -1, 0, 1). The ability distribution
was standard normal. The first collection of studies was focused on the tests for the shape of
the ICC's, the second collection of studies was focused on the tests for local independence.
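
The data generation step of such a power study can be sketched for the null model as follows. This is a minimal illustration, assuming the 2-PL parametrization $\exp(\alpha_i\theta - \beta_i)/(1 + \exp(\alpha_i\theta - \beta_i))$ that appears in the cells of Table 2; the function name, the seed and the use of Python's `random` module are choices made here, not details from the report. The contaminated ICC's and the MML estimation are not covered.

```python
import random
from math import exp

# Generating values of the power study: 9 items, standard normal ability.
ALPHA = [0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.5, 1.5, 1.5]
BETA = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0, -1.0, 0.0, 1.0]


def simulate_2pl(n_persons, alpha, beta, seed=0):
    """Draw dichotomous response patterns under the 2-PL model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # ability of one respondent
        row = []
        for a, b in zip(alpha, beta):
            p_correct = 1.0 / (1.0 + exp(-(a * theta - b)))
            row.append(1 if rng.random() < p_correct else 0)
        data.append(row)
    return data


data = simulate_2pl(1000, ALPHA, BETA, seed=1)
proportion_correct = [
    sum(row[i] for row in data) / len(data) for i in range(len(ALPHA))
]
```

One replication of a study corresponds to one call with a fresh seed; the proportions correct give a quick plausibility check on the generated data.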


Insert Table 8 about here 


The results of the first collection of studies are reported in Table 8. In these studies, the
ICC of item 5 was contaminated by introducing parameters $\gamma_{is}$ and $\delta_{is}$, $s = 1, \ldots, 4$. First, values
were set for some parameters $\gamma_{i\cdot}$ and $\delta_{i\cdot}$; these values are shown in the third and fourth column
of Table 8 under the labels "$\gamma_{i\cdot}$" and "$\delta_{i\cdot}$". Using these values, two patterns of violations were
created: $(\gamma_{i1}, \gamma_{i2}, \gamma_{i3}, \gamma_{i4}, \delta_{i1}, \delta_{i2}, \delta_{i3}, \delta_{i4})$ was equal to
$(-\gamma_{i\cdot}, \gamma_{i\cdot}, \gamma_{i\cdot}, -\gamma_{i\cdot}, -\delta_{i\cdot}, \delta_{i\cdot}, -\delta_{i\cdot}, \delta_{i\cdot})$ in the first version, and equal to
$(\gamma_{i\cdot}, -\gamma_{i\cdot}, -\gamma_{i\cdot}, \gamma_{i\cdot}, \delta_{i\cdot}, -\delta_{i\cdot}, \delta_{i\cdot}, -\delta_{i\cdot})$ in the second
version. For an example, consider Table 8, where every row corresponds to a simulation study.
Consider study 18. In the second column it can be seen that this study has the second pattern
of violations, the third and fourth column display that $\gamma_{i\cdot} = 0.50$ and $\delta_{i\cdot} = 0.50$, so here
$(\gamma_{i1}, \gamma_{i2}, \gamma_{i3}, \gamma_{i4}, \delta_{i1}, \delta_{i2}, \delta_{i3}, \delta_{i4})$ was equal to (-0.50, 0.50, 0.50, -0.50, -0.50, 0.50, -0.50, 0.50).
Using this setup, 100 replications were made for every row in Table 8. For every replication 1000
response patterns were generated, MML estimates were computed and $LM(\gamma_i, \delta_i)$-, $LM(\gamma_i)$-
and $LM(\delta_i)$-tests were performed using a 10% significance level. The proportion of significant
tests is displayed in the last six columns of Table 8; the tests in the columns labeled LM*(.)
were computed using the observed information matrix, the tests in the columns labeled LM(.)
were computed using the Fisher information matrix. The first row of Table 8 corresponds
to the null-model, that is, to the 2-PL model, and it can be seen that the proportion of tests
significant at 10% is approximately equal to 0.10, which is as it should be. Further, it can be
seen that the proportions of significant tests are monotone increasing in $\gamma_{i\cdot}$ and $\delta_{i\cdot}$, which is
also in accordance with the purpose of the tests. However, an interesting feature of the results is
that all tests are sensitive to all violations; for instance, $LM(\gamma_i)$ is sensitive both to a violation
$\gamma_{i\cdot} = 0$ and $\delta_{i\cdot} \neq 0$, and to a violation $\gamma_{i\cdot} \neq 0$ and $\delta_{i\cdot} = 0$. In fact, the power of $LM(\delta_i)$ to a
violation $\gamma_{i\cdot} \neq 0$ and $\delta_{i\cdot} = 0$ is greater than the power of $LM(\gamma_i)$. So it must be concluded
that attribution of the outcome of the test to specific parameters will be quite difficult. This
result must be attributed to the high correlation between estimates of the item discrimination
and item difficulty parameters, so the reason for the poor discriminative power of the tests
must be attributed to the properties of the 2-PL model, and not to the properties of the tests.
Summing up, the tests must be used as caution indices, and one must not expect to be able to
trace significant results back to either the item discrimination or difficulty parameters.
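
When reading the proportions in Table 8, it helps to keep the simulation error in mind. With 100 replications per study, a rejection proportion is a binomial proportion, and its standard error under the nominal 10% level can be computed as below. The two-standard-error band is a rule of thumb added here for illustration and is not part of the report; the function name is hypothetical.

```python
from math import sqrt


def monte_carlo_se(p, replications):
    """Standard error of an estimated rejection proportion."""
    return sqrt(p * (1.0 - p) / replications)


NOMINAL_LEVEL = 0.10
REPLICATIONS = 100
NULL_ROW = [0.10, 0.09, 0.11, 0.10, 0.10, 0.10]  # study 0 of Table 8

se = monte_carlo_se(NOMINAL_LEVEL, REPLICATIONS)
lower, upper = NOMINAL_LEVEL - 2 * se, NOMINAL_LEVEL + 2 * se
within_band = [lower <= r <= upper for r in NULL_ROW]
```

Differences of a few hundredths between neighbouring cells of Table 8 are therefore within simulation noise.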


Insert Table 9 about here 


Table 9 contains the results of a comparable study of the power of $LM(\gamma_{ij}, \delta_{ij})$,
$LM(\gamma_{ij})$ and $LM(\delta_{ij})$. Again, the number of items is 9 and the number of respondents is 1000.
Also the parameters of the 2-PL model were the same as in the previous study. In Table 9, every
row of the table corresponds to a study. Association between items was induced by introducing
additional parameters $\gamma_{ij}$ and $\delta_{ij}$; in the second and third column it can be seen that one half of
the studies concerns association between item 1 and 5, the other half concerns association
between item 5 and 8. The values of $\gamma_{ij}$ and $\delta_{ij}$ are displayed in the next two columns. For
every study, 100 replications were made and the proportion of LM tests significant at the 10%
level was computed. The results are given in the last 6 columns of Table 9. As above, the tests in
the columns labeled LM*(.) were computed using the observed information matrix, the tests
in the columns labeled LM(.) were computed using the Fisher information matrix. Contrary
to the above studies, the tests prove more discriminative with respect to the specific violation
imposed. So $LM(\delta_{ij})$ has substantial power for a violation $\gamma_{ij} = 0$ and $\delta_{ij} \neq 0$, while the
power of $LM(\gamma_{ij})$ is low. Analogously, for a violation $\gamma_{ij} \neq 0$ and $\delta_{ij} = 0$ the opposite applies.

Discussion

In the present article, it was shown that LM tests and modification indices are a
practical and useful tool for evaluation of model fit. Their practicality is a result of the
circumstance that most of the ingredients needed are available at the end of the estimation
procedure, so very few additional computations have to be made. They are useful because
they are item-oriented diagnostic tools, which give an indication of the source of model
violations. Potentially, they offer the possibility of directed model relaxation to obtain sufficient
model fit. On the other hand, the discriminative power of the approach must not be exaggerated.
For instance, if the model is grossly violated, a sum score $r(x_n^{(i)})$ on a partial response vector $x_n^{(i)}$
may no longer be a valid indication of ability, so that the underpinning of the $LM(\gamma_i)$-, $LM(\delta_i)$-
and $LM(\gamma_i, \delta_i)$-test becomes unrealistic. Further, the discriminative power of the tests is, of
course, also limited by the characteristics of the model; for instance, the power study made
apparent that the well-known dependence between $\alpha_{ig}$ ($g = 1, \ldots$) and $\beta_{ig}$ ($g = 1, \ldots$)
obstructed the attribution of model violations to either set of parameters. An advantageous
aspect of some of the statistics is that they are based on a difference between observed and
expected frequencies, so the importance of a significant model test can be assessed in a
framework that is directly related to observed data. The approach presented here can obviously
be extended in several directions. The first extension is tailoring the approach to the 3-PL model.
Further, the model can also be extended to encompass models with multidimensional ability
distributions. Finally, in many structural models on ability parameters, the item parameter
estimates issued from a calibration phase are imputed into the structural model as known
constants. Evaluation of the validity of these imputed constants when confronted with
the new data seems another promising area where LM statistics and modification indices might
be useful.
Table 1

Expressions for $B_n(\phi_a, \phi_b)$ for the parameters of item i

“Pnihs “Pnigs

kgs 

On ‘Pnigs'Pnihs 

Oni>nigs(l~ll>nigs) 

'tpnigs'lpnihs 

— — Ipnigs) 




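Table 2

Cross-tabulation of Probabilities
$\Pr(x_i, x_j \mid \alpha_i, \alpha_j, \beta_i, \beta_j, \gamma_{ij}, \delta_{ij}) \propto$

|  | $x_i = 0$ | $x_i = 1$ |
|---|---|---|
| $x_j = 0$ | $1$ | $\exp(\alpha_i\theta - \beta_i)$ |
| $x_j = 1$ | $\exp(\alpha_j\theta - \beta_j)$ | $\exp((\alpha_i + \alpha_j + \gamma_{ij})\theta - \beta_i - \beta_j + \delta_{ij})$ |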
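
The cells of Table 2 are given up to a proportionality constant. The sketch below normalizes the four cells to probabilities and illustrates that the cross-tabulation reduces to two independent 2-PL items when $\gamma_{ij} = \delta_{ij} = 0$. It assumes the cell expressions exactly as printed above, including the positive sign of $\delta_{ij}$; the function name and the parameter values are hypothetical.

```python
from math import exp


def pair_probabilities(theta, a_i, a_j, b_i, b_j, gamma_ij=0.0, delta_ij=0.0):
    """Joint probabilities of (x_i, x_j) from the kernels of Table 2."""
    kernels = {
        (0, 0): 1.0,
        (1, 0): exp(a_i * theta - b_i),
        (0, 1): exp(a_j * theta - b_j),
        (1, 1): exp((a_i + a_j + gamma_ij) * theta - b_i - b_j + delta_ij),
    }
    norm = sum(kernels.values())
    return {cell: value / norm for cell, value in kernels.items()}


def marginals(probs):
    """Marginal probabilities of a correct response to item i and item j."""
    p_i = probs[(1, 0)] + probs[(1, 1)]
    p_j = probs[(0, 1)] + probs[(1, 1)]
    return p_i, p_j


# Hypothetical parameter values, chosen only for illustration.
independent = pair_probabilities(0.3, 1.0, 1.5, -0.5, 0.4)
associated = pair_probabilities(0.3, 1.0, 1.5, -0.5, 0.4, delta_ij=0.5)
```

In the first case the probability of two correct responses equals the product of the marginals; in the second case it exceeds that product, which is the kind of association the tests for local independence are meant to detect.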
Table 3


Expressions for $B_n(\phi_a, \phi_b)$ for the parameters of item i



7 ih 

Hg 

SiK 

dig 

7*93 h 

~&nCnigjh(l — Cnigjh) 

6 nCnigjhCnikjJ 

Qnigj h ( 1 Cnijj/i) 

~~@n Cnigj hCnikjl 

likjl 

Cn i kj Knigj h 

~ Cnikjl) 

Qnikj iQnigj h 

@nCnikjl(l Cnifcjff) 

d igj h 

@nCnigj Cnigjh ) 

~~~@n Cnigj h^nikjl 

~Cnigj h{l ~Cnigjh) 

Cnigj h^nikjl 

dikjt 

~ Cn t kj iCn » gj h 

@nCnikjl(l ~ Cntfcji) 

GnikjlGnigjh 

— Cnt jfcj /( 1 Cnifcjj) 

Otig 

— &n(nigjh(l “ 

"" XI /i Cni*;'/) 

QnCnigj h(l £/ Cn /$./'/) 

^nCni$ji h(l ~ XI* C nigjh) 

&ik 

^ nCnigjh Xlf Cnikjl 

^nCnilrj/ XI/! Cnifcjft) 

0nCni$jfh £/ C nikjl 

~@nCnikjt £/! Cnifcjfh) 

Pi, 

Qn£nigjh{l ~ XI/ Cnt^jf/) 

^nCniibji/(l “ XI*. Cni$ji/i) 

~Qnigjh{} ~ £/ n WO 

“ CniJbji(l — £h Cnt5;7i) 

Pa 

“^nCrn'^/i Xlf Cnikjl 

— ^nCntijii XI>. Gntkjh 

Cnicji /i Xli Cnikjl 

Cnikjl £fc Cnikjh 
Table 4

Data Summary and MML parameter estimation of 10 Examination Items
Number of observations = 2039

| item | p-value | rit | $\alpha_i$ | $\beta_i$ | $Se^*(\alpha_i)$ | $Se^*(\beta_i)$ | $Se(\alpha_i)$ | $Se(\beta_i)$ | score | frequency |
|---|---|---|---|---|---|---|---|---|---|---|
|  |  |  |  |  |  |  |  |  | 0 | 2 |
| 1 | .40 | .36 | .30 | .40 | .071 | .047 | .071 | .047 | 1 | 7 |
| 2 | .86 | .41 | 1.28 | -2.31 | .176 | .145 | .169 | .138 | 2 | 23 |
| 3 | .87 | .37 | .95 | -2.16 | .132 | .105 | .132 | .105 | 3 | 92 |
| 4 | .49 | .41 | .50 | .06 | .075 | .047 | .077 | .047 | 4 | 175 |
| 5 | .81 | .39 | .75 | -1.59 | .103 | .074 | .106 | .075 | 5 | 314 |
| 6 | .57 | .42 | .59 | -.32 | .078 | .049 | .081 | .049 | 6 | 380 |
| 7 | .66 | .39 | .53 | -.71 | .080 | .051 | .082 | .051 | 7 | 425 |
| 8 | .63 | .47 | .85 | -.61 | .097 | .055 | .100 | .056 | 8 | 333 |
| 9 | .62 | .40 | .49 | -.52 | .078 | .049 | .079 | .049 | 9 | 224 |
| 10 | .56 | .43 | .63 | -.25 | .083 | .049 | .083 | .049 | 10 | 64 |
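Table 5

LM modification indices for 10 Examination Items

| item | $LM^*(\gamma_i,\delta_i)$ | p | $LM^*(\gamma_i)$ | p | $LM^*(\delta_i)$ | p | $LM(\delta_i)$ | p |
|---|---|---|---|---|---|---|---|---|
| 1 | 3.26 | .78 | .87 | .83 | 1.36 | .72 | 1.41 | .70 |
| 2 | 11.93 | .06 | 12.64 | .01 | 2.48 | .48 | 2.58 | .46 |
| 3 | 13.29 | .04 | 5.48 | .14 | .75 | .86 | 1.41 | .70 |
| 4 | 2.88 | .82 | 1.12 | .77 | .45 | .93 | .46 | .93 |
| 5 | 4.43 | .62 | 2.29 | .52 | 1.02 | .80 | .90 | .82 |
| 6 | 7.47 | .28 | 2.47 | .48 | 4.29 | .23 | 5.00 | .17 |
| 7 | 11.62 | .07 | 4.30 | .23 | 9.20 | .03 | 9.70 | .02 |
| 8 | 7.31 | .29 | 3.63 | .30 | 1.53 | .67 | 1.52 | .68 |
| 9 | 11.10 | .09 | 5.51 | .14 | 6.10 | .11 | 6.30 | .10 |
| 10 | 9.15 | .17 | 6.56 | .09 | 3.76 | .29 | 4.23 | .24 |

| item i | item j | $LM^*(\gamma_{ij},\delta_{ij})$ | p | $LM^*(\gamma_{ij})$ | p | $LM^*(\delta_{ij})$ | p | $LM(\delta_{ij})$ | p |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 4.25 | .12 | 3.39 | .07 | .62 | .43 | .64 | .42 |
| 2 | 3 | .46 | .80 | .44 | .51 | .41 | .52 | .41 | .52 |
| 3 | 4 | 18.91 | .00 | 4.93 | .03 | 19.73 | .00 | 18.69 | .00 |
| 4 | 5 | .85 | .65 | .80 | .37 | .31 | .58 | .31 | .58 |
| 5 | 6 | 1.89 | .39 | 1.74 | .19 | .18 | .67 | .18 | .67 |
| 6 | 7 | .89 | .64 | .35 | .55 | .27 | .61 | .26 | .61 |
| 7 | 8 | 3.85 | .15 | 3.59 | .06 | .16 | .69 | .16 | .69 |
| 8 | 9 | 4.64 | .10 | .91 | .34 | 2.40 | .12 | 2.22 | .14 |
| 9 | 10 | 2.41 | .30 | 1.73 | .19 | .24 | .62 | .23 | .63 |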
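Table 6

LM modification indices in a Bayesian Framework

Statistics Computed Using Fixed Prior

| item | $LM(\gamma_i,\delta_i)$ | p | $LM(\gamma_i)$ | p | $LM(\delta_i)$ | p |
|---|---|---|---|---|---|---|
| 1 | 4.99 | .55 | 2.45 | .48 | 1.97 | .58 |
| 2 | 11.83 | .07 | 6.78 | .08 | 2.56 | .47 |
| 3 | 4.94 | .55 | 3.58 | .31 | 1.28 | .73 |
| 4 | 1.63 | .95 | .64 | .89 | .64 | .89 |
| 5 | 3.31 | .77 | 2.23 | .53 | 1.14 | .77 |
| 6 | 8.37 | .21 | 2.66 | .45 | 5.12 | .16 |
| 7 | 11.88 | .06 | 5.54 | .14 | 10.82 | .01 |
| 8 | 5.30 | .51 | 2.60 | .46 | 1.58 | .66 |
| 9 | 11.73 | .07 | 6.09 | .11 | 7.21 | .07 |
| 10 | 9.91 | .13 | 6.25 | .10 | 4.70 | .20 |

| item i | item j | $LM(\gamma_{ij},\delta_{ij})$ | p | $LM(\gamma_{ij})$ | p | $LM(\delta_{ij})$ | p |
|---|---|---|---|---|---|---|---|
| 1 | 2 | 1.59 | .45 | .20 | .66 | 1.57 | .21 |
| 2 | 3 | .30 | .86 | .27 | .60 | .11 | .74 |
| 3 | 4 | 18.49 | .00 | 7.61 | .01 | 17.78 | .00 |
| 4 | 5 | 2.06 | .36 | 2.03 | .15 | .13 | .72 |
| 5 | 6 | .58 | .75 | .53 | .46 | .27 | .61 |
| 6 | 7 | .09 | .96 | .01 | .92 | .08 | .77 |
| 7 | 8 | 1.85 | .40 | 1.79 | .18 | .38 | .54 |
| 8 | 9 | 2.04 | .36 | .18 | .67 | 1.50 | .22 |
| 9 | 10 | .42 | .81 | .31 | .58 | .06 | .81 |

Statistics Computed Using Empirical Prior

| item | $LM(\gamma_i,\delta_i)$ | p | $LM(\gamma_i)$ | p | $LM(\delta_i)$ | p |
|---|---|---|---|---|---|---|
| 1 | 15.22 | .02 | 4.55 | .21 | 2.24 | .52 |
| 2 | 8.47 | .21 | 7.26 | .06 | 3.90 | .27 |
| 3 | 8.25 | .22 | 4.86 | .18 | 1.49 | .68 |
| 4 | 1.21 | .98 | .41 | .94 | .67 | .88 |
| 5 | 4.08 | .67 | 1.15 | .76 | 1.26 | .74 |
| 6 | 7.68 | .26 | 2.32 | .51 | 5.27 | .15 |
| 7 | 11.59 | .07 | 4.50 | .21 | 10.67 | .01 |
| 8 | 6.79 | .34 | 1.31 | .73 | 1.37 | .71 |
| 9 | 11.31 | .08 | 4.89 | .18 | 7.52 | .06 |
| 10 | 9.50 | .15 | 4.68 | .20 | 4.80 | .19 |

| item i | item j | $LM(\gamma_{ij},\delta_{ij})$ | p | $LM(\gamma_{ij})$ | p | $LM(\delta_{ij})$ | p |
|---|---|---|---|---|---|---|---|
| 1 | 2 | 1.33 | .51 | .01 | .93 | 1.01 | .32 |
| 2 | 3 | 11.37 | .00 | .48 | .49 | .94 | .33 |
| 3 | 4 | 17.20 | .00 | 5.75 | .02 | 16.81 | .00 |
| 4 | 5 | 1.55 | .46 | 1.49 | .22 | .04 | .84 |
| 5 | 6 | .98 | .61 | .91 | .34 | .44 | .51 |
| 6 | 7 | .09 | .96 | .00 | .98 | .08 | .78 |
| 7 | 8 | 2.65 | .27 | 2.62 | .11 | .32 | .57 |
| 8 | 9 | 2.52 | .28 | .55 | .46 | 1.44 | .23 |
| 9 | 10 | .70 | .70 | .58 | .45 | .05 | .83 |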
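Table 7

Modification Indices and Parameter Estimation

Modification Indices

| item | group | $\gamma_{is}$ | $\delta_{is}$ | $se(\gamma_{is})$ | $se(\delta_{is})$ |
|---|---|---|---|---|---|
| 1 | 1 | -.169 | .129 | .197 | .152 |
|  | 2 | .013 | -.039 | .669 | .099 |
|  | 3 | .230 | .190 | .894 | .357 |
|  | 4 | -.413 | -.432 | .534 | .468 |
| 2 | 1 | -.625 | .642 | .301 | .290 |
|  | 2 | .412 | -.405 | 1.865 | .721 |
|  | 3 | -.272 | .187 | 1.889 | .254 |
|  | 4 | 1.909 | -.345 | 3.432 | .870 |

Parameter Estimates

| item | group | $\gamma_{is}$ | $\delta_{is}$ | $se(\gamma_{is})$ | $se(\delta_{is})$ |
|---|---|---|---|---|---|
| 1 | 1 | -.159 | .124 | .193 | .153 |
|  | 2 | .013 | -.039 | .665 | .099 |
|  | 3 | .243 | .198 | .955 | .407 |
|  | 4 | -.408 | -.424 | .516 | .447 |
| 2 | 1 | -.557 | .555 | .225 | .198 |
|  | 2 | .588 | -.510 | 2.148 | 1.110 |
|  | 3 | -.274 | .182 | 1.387 | .211 |
|  | 4 | .391 | -.542 | 3.860 | .399 |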
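
The standardized indices displayed in Figure 1 are one-step estimates divided by their standard errors. As an illustration, the sketch below standardizes the one-step estimates of item 2 from the "Modification Indices" part of Table 7; the variable and function names are hypothetical.

```python
# One-step estimates and standard errors for item 2, score groups 1..4,
# read from the "Modification Indices" part of Table 7.
ONE_STEP = {
    "gamma": [-0.625, 0.412, -0.272, 1.909],
    "delta": [0.642, -0.405, 0.187, -0.345],
}
STANDARD_ERRORS = {
    "gamma": [0.301, 1.865, 1.889, 3.432],
    "delta": [0.290, 0.721, 0.254, 0.870],
}


def standardize(estimates, standard_errors):
    """Divide each estimate by its standard error."""
    return [est / se for est, se in zip(estimates, standard_errors)]


z = {name: standardize(ONE_STEP[name], STANDARD_ERRORS[name]) for name in ONE_STEP}

# Score group (1-based) with the largest standardized deviation.
largest_group = {
    name: max(range(len(values)), key=lambda s: abs(values[s])) + 1
    for name, values in z.items()
}
```

For both parameters the largest standardized deviation occurs in the lowest score group, while the large raw estimate in group 4 shrinks once its standard error is taken into account.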
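Table 8

Study of the Power of the Test for the Shape of the ICC
100 Replications per Study

| study | pattern | $\gamma_{i\cdot}$ | $\delta_{i\cdot}$ | $LM^*(\gamma_i,\delta_i)$ | $LM(\gamma_i,\delta_i)$ | $LM^*(\gamma_i)$ | $LM(\gamma_i)$ | $LM^*(\delta_i)$ | $LM(\delta_i)$ |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | .00 | .00 | .10 | .09 | .11 | .10 | .10 | .10 |
| 1 | 1 | .10 | .00 | .16 | .06 | .13 | .13 | .11 | .10 |
| 2 |  | .25 | .00 | .29 | .36 | .17 | .23 | .32 | .33 |
| 3 |  | .50 | .00 | .74 | .77 | .50 | .56 | .80 | .81 |
| 4 | 2 | .10 | .00 | .29 | .13 | .17 | .17 | .13 | .09 |
| 5 |  | .25 | .00 | .30 | .19 | .25 | .17 | .30 | .26 |
| 6 |  | .50 | .00 | .76 | .72 | .63 | .52 | .84 | .76 |
| 7 | 1 | .00 | .10 | .23 | .22 | .25 | .21 | .25 | .24 |
| 8 |  | .00 | .25 | .78 | .78 | .52 | .58 | .76 | .81 |
| 9 |  | .00 | .50 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 10 | 2 | .00 | .10 | .31 | .23 | .25 | .23 | .32 | .26 |
| 11 |  | .00 | .25 | .85 | .82 | .78 | .75 | .88 | .86 |
| 12 |  | .00 | .50 | 1.00 | 1.00 | .97 | .97 | 1.00 | 1.00 |
| 13 | 1 | .10 | .10 | .32 | .39 | .27 | .28 | .34 | .37 |
| 14 |  | .25 | .25 | .97 | .97 | .84 | .89 | .99 | .99 |
| 15 |  | .50 | .50 | 1.00 | 1.00 | .99 | .99 | 1.00 | 1.00 |
| 16 | 2 | .10 | .10 | .40 | .37 | .34 | .30 | .47 | .44 |
| 17 |  | .25 | .25 | .95 | .96 | .93 | .90 | 1.00 | .99 |
| 18 |  | .50 | .50 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |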
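Table 9

Study of the Power of the Test for Association between items
100 Replications per Study

| study | item i | item j | $\gamma_{ij}$ | $\delta_{ij}$ | $LM^*(\gamma_{ij},\delta_{ij})$ | $LM(\gamma_{ij},\delta_{ij})$ | $LM^*(\gamma_{ij})$ | $LM(\gamma_{ij})$ | $LM^*(\delta_{ij})$ | $LM(\delta_{ij})$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | .00 | .00 | .09 | .09 | .08 | .08 | .11 | .12 |
| 1 | 1 | 5 | .05 | .00 | .10 | .10 | .10 | .10 | .13 | .14 |
| 2 |  |  | .10 | .00 | .13 | .14 | .13 | .13 | .14 | .14 |
| 3 |  |  | .25 | .00 | .24 | .22 | .22 | .21 | .10 | .10 |
| 4 |  |  | .50 | .00 | .55 | .57 | .68 | .69 | .11 | .11 |
| 5 | 5 | 8 | .05 | .00 | .15 | .16 | .13 | .12 | .07 | .08 |
| 6 |  |  | .10 | .00 | .12 | .15 | .17 | .15 | .12 | .12 |
| 7 |  |  | .25 | .00 | .21 | .18 | .19 | .18 | .13 | .13 |
| 8 |  |  | .50 | .00 | .36 | .38 | .46 | .47 | .15 | .15 |
| 9 | 1 | 5 | .00 | .05 | .13 | .15 | .06 | .13 | .14 | .15 |
| 10 |  |  | .00 | .10 | .13 | .15 | .10 | .11 | .17 | .16 |
| 11 |  |  | .00 | .25 | .35 | .39 | .13 | .12 | .49 | .49 |
| 12 |  |  | .00 | .50 | .92 | .92 | .13 | .13 | .96 | .96 |
| 13 | 5 | 8 | .00 | .05 | .13 | .16 | .06 | .06 | .17 | .19 |
| 14 |  |  | .00 | .10 | .17 | .21 | .14 | .14 | .22 | .21 |
| 15 |  |  | .00 | .25 | .35 | .38 | .09 | .10 | .38 | .42 |
| 16 |  |  | .00 | .50 | .90 | .91 | .14 | .15 | .93 | .94 |
| 13 | 1 | 5 | .05 | .05 | .11 | .11 | .11 | .12 | .09 | .10 |
| 13 |  |  | .10 | .10 | .12 | .13 | .13 | .14 | .15 | .17 |
| 14 |  |  | .25 | .25 | .57 | .58 | .40 | .39 | .59 | .60 |
| 15 |  |  | .50 | .50 | .97 | .97 | .90 | .90 | .93 | .93 |
| 16 | 5 | 8 | .05 | .05 | .11 | .11 | .07 | .07 | .11 | .13 |
| 16 |  |  | .10 | .10 | .17 | .17 | .10 | .11 | .15 | .15 |
| 17 |  |  | .25 | .25 | .41 | .41 | .17 | .18 | .44 | .47 |
| 18 |  |  | .50 | .50 | .87 | .87 | .51 | .54 | .83 | .84 |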
Figure 1. Graphic Display of the Efficient Score Test for Two Items.